Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Response Ops] Event log aggregations #126948

Merged
merged 28 commits into from
Mar 10, 2022

Conversation

ymao1
Copy link
Contributor

@ymao1 ymao1 commented Mar 4, 2022

Towards #125645

Summary

  • adds aggregation to event log client plugin API
  • does not add HTTP API for aggregations
  • updates the event log find API to take in an array of sort values instead of a single field (request from [Event Log] Add support for aggregations #125645)

This PR will be used to support adding an execution log API to the rules client but nothing uses the event log aggregation yet in this PR.

Checklist

@ymao1 ymao1 added the skip-ci label Mar 4, 2022
@ymao1 ymao1 removed the skip-ci label Mar 7, 2022
@ymao1 ymao1 changed the title wip [Response Ops] Event log aggregations Mar 7, 2022
@ymao1 ymao1 self-assigned this Mar 8, 2022
@ymao1 ymao1 added backport:skip This commit does not require backporting Feature:EventLog release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.2.0 labels Mar 8, 2022
@ymao1 ymao1 marked this pull request as ready for review March 8, 2022 17:27
@ymao1 ymao1 requested review from a team as code owners March 8, 2022 17:27
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@ymao1
Copy link
Contributor Author

ymao1 commented Mar 8, 2022

@elasticmachine merge upstream

Copy link
Member

@pmuellr pmuellr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Left a nit comment about the aggs query result property name (names!).

Thanks for pushing this over the line!!!!

The biggest benefactor of this will hopefully be the get alert status code, that currently does an in-JS aggregation. Will be a good test of this API, as well as if we structured the event log docs good enough to be able to get that data via aggs ...

@ymao1
Copy link
Contributor Author

ymao1 commented Mar 9, 2022

@elasticmachine merge upstream

@ymao1
Copy link
Contributor Author

ymao1 commented Mar 9, 2022

@elasticmachine merge upstream

Copy link
Member

@spong spong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked out, rebased the in-memory aggs POC (PR) against these changes and started testing aggs queries. Still finalizing the query we'd use on the security side, but I don't see anything that would need to change with how aggs were exposed from the event-log client. LGTM! Thanks for making this happen for 8.2 @ymao1! 🙂 🙌 🚀

Copy link
Contributor

@ersin-erdal ersin-erdal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
eventLog 82 91 +9

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
eventLog 6 9 +3
Unknown metric groups

API count

id before after diff
eventLog 82 91 +9

ESLint disabled line counts

id before after diff
eventLog 5 4 -1

Total ESLint disabled count

id before after diff
eventLog 5 4 -1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

@ymao1 ymao1 merged commit 9e61f35 into elastic:main Mar 10, 2022
@ymao1 ymao1 deleted the alerting/event-log-aggregate branch March 10, 2022 17:38
spong added a commit that referenced this pull request Mar 28, 2022
## Summary

Resolves #119598, #119599, #101014

Test plan ([internal doc](https://docs.google.com/document/d/1-prIUGYaPHiwGA79CgSdw1926lxIPKGWWkYOUD2BM1U/edit#heading=h.womzsfdt6zt8))

Adds `Rule Execution Log` table to Rule Details page:

<p align="center">
  <img width="700" src="https://user-images.githubusercontent.com/2946766/158540840-e9cddb9b-f33d-4b95-86ad-cb3e0a00cf39.gif" />
</p>


### Implementation notes

The useful metrics within `event-log` for a given rule execution are spread between a few different platform (`execute-start`, `execute`) and security (`execution-metrics`, `status-change`) events. In effort to provide consolidated metrics per rule execution (and avoiding a lot of empty cells and mis-matched statuses like in the image below)

<p align="center">
  <img width="700" src="https://user-images.githubusercontent.com/2946766/151933881-2e58f4d7-4cda-4528-9d44-37cb7bd5de9c.png" />
</p>



these rule execution events are aggregated by their `executionId`, and then fields are merged from each different event. This PR was re-worked to take advantage of the new event-log aggregation support added in #126948, and is no longer implemented as an in-memory aggregation server side.

* Due to restrictions around supplying search filters that may match multiple sub-agg buckets and missing data ([see discussion here](https://github.com/elastic/kibana/pull/127339/files#r825240516)), it was decided that we'd disable the search bar for the time being. We have both a near-term (writing single rollup event) and long-term (ES|QL) solution that will allow us to re-enable this functionality.

* Note, since a `terms` agg is used to fetch all execution events, an upper bound must be set. See [this discussion](https://github.com/elastic/kibana/pull/127339/files#r823035420) for more details, but setting this max to `1000` events for the time being, and returning total cardinality of execution events back within `total` to allow the UI to inform the user that they should narrow their search further to better isolate and find possible issues. This should be a be a reasonable constraint for most all rules as a rule executing every 5 minutes, 1000 executions would cover over 3 days of execution time.

<p align="center">
  <img width="700" src="https://user-images.githubusercontent.com/2946766/159045563-966896b4-3cd1-475d-9f0e-c2d300683546.png" />
</p>


The `Filter for alerts` action will be available on all `Succeeded`/`Partial Failure` executions even if there weren't alerts generated until #126210 is merged and we can start returning the alert count, at which point we can programmatically enabled/disable this action based on alert count.



<p align="center">
  <img width="300" src="https://user-images.githubusercontent.com/2946766/159051762-e2f97ba4-4ce1-4f67-8ae1-395e4b191cab.png" />
</p>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:EventLog release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants